All Roads Lead to Philosophy
Examining the ‘Getting to Philosophy’ Phenomenon on Wikipedia using Network Analysis
Keywords
- Wikipedia
- Philosophy
- Getting to Philosophy
- First Page Philosophy
- First Link
- Navigability
Abstract
In this study, I analyze a phenomenon on Wikipedia in which repeatedly clicking the first link of a page invariably takes a user to the Philosophy page. I examine the percentage of Wikipedia pages for which this holds in an effort to understand how Wikipedia’s network is structured and what that means for user navigability and understanding. Previous research indicates that users’ page navigation is heavily focused on the lead of a Wikipedia article, rarely venturing beyond the first paragraph¹; therefore, I limit my analysis to the first several links in this section. Further analysis with greater computing power could be done on the links within the entire article. Among these first several links, I seek to determine whether any other link position reaches a specific page, including the philosophy page, with abnormal frequency. To conduct my analysis, I construct a network using Wikipedia pages as nodes and the links on each page as directed edges between nodes. Since I am focused on reaching the philosophy page, once I reach a page that has already been determined to reach the philosophy page, I move on to another root page. With the network, I examine average path lengths to the philosophy page, the neighbors of the philosophy page that most commonly direct to it, and the nature of the philosophy node itself that results in this phenomenon. My conclusions demonstrate the effectiveness of Wikipedia’s effort to make its introductory sentences and links broad, as well as the cementing of Philosophy as “the first science”.
Introduction
Wikipedia pages are built with the user’s understanding in mind. To ensure consistency across pages and maintain reliability as a credible source, there are extensive guidelines on the structure of each page. As some of the most important components of a Wikipedia page, linked content and the content of the lead paragraph are tightly monitored. Links serve to “provide instant pathways to locations within and outside the project that can increase readers’ understanding of the topic at hand.” (Wikipedia 2023b) Users will click on links when a topic is unfamiliar to them or when they are interested in learning more.
When arriving at a page, a user ought to have the topic explained to them as though they know little to nothing about it. The lead ought to orient the reader so as to “set the scene of the topic”. (Wikipedia 2023b) Wikipedia explains the structure of the lead paragraph:
In Wikipedia, the lead section is an introduction to an article and a summary of its most important contents. It is located at the beginning of the article, before the table of contents and the first heading. It is not a news-style lead or “lede” paragraph.
The average Wikipedia visit is a few minutes long. The lead is the first thing most people will read upon arriving at an article, and may be the only portion of the article that they read. It gives the basics in a nutshell and cultivates interest in reading on—though not by teasing the reader or hinting at what follows. It should be written in a clear, accessible style with a neutral point of view.(Wikipedia 2023b)
Wikipedia goes on to outline how the opening paragraph and sentence ought to be structured. They explain that the opening paragraph “should establish the context in which the topic is being considered by supplying the set of circumstances or facts that surround it. If appropriate, it should give the location and time.” (Wikipedia 2023b) For example, a building’s first link will most likely be its location. Within that paragraph, the opening sentence is critical for my study as it will contain the first link. Editors are instructed that “the first sentence should tell the nonspecialist reader what or who the subject is, and often when or where.” (Wikipedia 2023b) They go on to provide explicit instructions on what the first linked topic ought to be in an article:
The first sentence should provide links to the broader or more elementary topics that are important to the article’s topic or place it into the context where it is notable.
For example, an article about a building or location should include a link to the broader geographical area of which it is a part.
Arugam Bay is a bay on the Indian Ocean in the dry zone of Sri Lanka’s southeast coast.
In an article about a technical or jargon term, the first sentence or paragraph should normally contain a link to the field of study that the term comes from.
In heraldry, tinctures are the colours used to emblazon a coat of arms.
The first sentence of an article about a person should link to the page or pages about the topic where the person achieved prominence.
Harvey Lavan “Van” Cliburn Jr. (July 12, 1934 – February 27, 2013) was an American pianist who achieved worldwide recognition in 1958 at age 23, when he won the first quadrennial International Tchaikovsky Piano Competition in Moscow, at the height of the Cold War.
Exactly what provides the context needed to understand a given topic varies greatly from topic to topic.(Wikipedia 2023b)
As you can see, the first link of a page becomes increasingly broad as you continue to click the first link. These instructions create a picture of how a topic like philosophy can be at the center of Wikipedia’s first-link network. Conversely, it is doubtful that such a center exists for another link placement. Even just the second link in an article can be increasingly specific, moving laterally or even backwards in specificity rather than towards larger hubs such as philosophy. Take one of Wikipedia’s examples, Harvey Lavan “Van” Cliburn Jr.: his first-link path begins with pianist, then continues as follows: piano, keyboard instrument, musical instrument, music, art, creativity, psychology, mind, thought, consciousness, awareness, philosophy. With each passing link, you can sense your arrival at the philosophy page growing closer; the topics become broader and the connection to philosophy feels increasingly obvious. However, if we were to follow the second link, International Tchaikovsky Piano Competition, we would find ourselves on the following path: Saint Petersburg, Russia, Eastern Europe, Ural Mountains, Eurasia, Europe, peninsulas, mainland, continent, regions, Earth’s surface, hemispheres, etc. Unlike the first link, the second link gets stuck in geographic limbo without ever getting closer to a central topic like Philosophy. I will explore what a second-link network looks like further in my analysis.
There is special focus on the very beginning of a Wikipedia page because that is where users devote most of their attention. Dimitrov et al. utilize click data from Wikipedia’s navigation logs to construct a heat map of where users click the most on Wikipedia pages. The heat map shows two clear dark red, high-density lines at the beginning of the page, directly where the lead is located, demonstrating that users’ highest click rate is on links within the first few lines of the opening paragraph. The rest of the page is sparse beyond a preference for links on the left side of pages, a phenomenon the authors themselves do not fully understand. (Dimitar Dimitrov 2016) The high click rate within the lead tells us that the network formed by the first few links in an article is representative of the network that users typically interact with.
Research has already been done into the size of the Giant Connected Component (GCC) of nodes that connect to the philosophy node. In a study of Wikipedia’s navigability by language, as of 2017, 97.0% of pages in English will reach the philosophy page (Daniel Lamprecht 2016), a slight increase of around 2.5% since 2011. (Wikipedia 2023a) These numbers fluctuate across languages, with some languages having a center on pages such as Psychology in Spanish or Person in Japanese, each with varying sizes but with the majority of nodes still reaching these pages²; my study focuses only on the English network of Wikipedia. In the future, it would be interesting to study this phenomenon in other languages as I have done with English. In particular, previous studies indicate that Dutch has the smallest GCC, with just 67.0% of nodes in its GCC. (Daniel Lamprecht 2016) I would like to compare its network to the English one to understand this discrepancy.
If you would like to see how this network is formed beyond clicking through Wikipedia pages on your own, the online page xefer will quickly build out a network of pages and their first links until you reach the philosophy page. This is a helpful tool for visualizing what the network looks like in practice. However, it was designed to always reach the philosophy page, even for those pages that manage to avoid it. It does this by skipping to the second link on a page when it realizes it will not be able to reach the philosophy page through the first link. (xefer 2011) Therefore, we need to construct our own network if we want to understand these disconnected nodes.
To understand how a node can be disconnected, we ought to look at what makes philosophy the center of the network. If you click on the first link on the philosophy page, you will find yourself back on the philosophy page in 6 clicks. This cycle forms a bottom of sorts to the network, as nothing beyond the 5 pages you reach from the philosophy page can be found from there. Among those 5 pages, as I will later show, philosophy is by far the largest node by density, making it the logical choice for the center. For another node to avoid the philosophy node, it would require a similar cycle. Therefore, it is going to be a broad topic, as it has to be something that could similarly appear in the first sentence of a Wikipedia page. This eliminates highly specific pages from consideration despite them being the intuitive guess for what might manage to avoid philosophy.
Among links in the lead, a page’s neighbors will remain semantically related to that page. In a study that constructed Wikipedia’s network using the first ten links in an article as a node’s edges, it was determined that the nodes form communities of semantically related terms. (Neven Matas 2015) For example, the mathematics page will be in a community of other topics related to math, such as physics. For our purposes, this is an important result as it helps to paint a picture of what the branches stemming from philosophy’s neighbors will look like. For example, we can now expect all scientific terms to be connected in communities, allowing them all to pass through the science page on their way to the philosophy page.
Beyond some of the quicker results, such as the size of the GCC, the average path length to philosophy, the number of disconnected components, and the nature of networks from other link locations, I will also look extensively at the neighbors of the philosophy node. If the philosophy node is removed from the network, how large is the remaining GCC and what is its largest node? I hypothesize that the awareness node and its connecting parts will form the basis of the GCC and that the network will not shrink by more than 10%. However, if awareness were removed as well, I expect the GCC would shrink dramatically, as the awareness node serves as a bridge between all scientific topics and all location-based topics (buildings, monuments, historical figures). Finally, I will test whether there is a statistically significant relationship between distance from the philosophy node and node density, as I would expect due to the generality of the topics.
Methods
All of my analysis and data collection were done using Python 3.10.12.
Finding First Links
get_first_link(page_url)
By far the most difficult task was writing the function that finds the first (or second) link on a Wikipedia page. What is an incredibly easy task for the human eye proved to be quite difficult to program. If the task were simply to get the first linked content on a page, it would be quite easy. However, the phenomenon, taken literally, concerns the first link to another Wikipedia page in the content section of the page that is not in parentheses and is not a citation. Finding this took a lot of trial and error, as Wikipedia pages vary far more than you might think.
I first tried to use the Wikipedia API. Its links attribute seemed to be an easy way to grab the links on a page. However, there is no functionality to get the links in order of appearance; instead, they are returned alphabetically. I briefly investigated ways to determine which of these links came first by parsing the HTML, but quickly found that it would be easier to do the entire task using the HTML.
To read the Wikipedia pages, I used the Requests and Beautiful Soup libraries. I then found all of the paragraph content on the Wikipedia page. I decided to exclude list components (bullet points) from my analysis, as I felt they did not meet the same criteria as a link within a paragraph. This means that pages like History of the Administrative Divisions of China or 1965 Palanca Awards will have ‘no links’, as there are no links in their primary content. In future analysis, I hope to include these links and compare the results to see which is a better measure.
From there, I found all of the hyperlinks in each paragraph, then got the href and class for each link. I used these to filter out any “bad” links such as citations, files, links that leave Wikipedia, and, most challenging, links within parentheses in the text. This was a difficult decision, as sometimes the text inside parentheses seems meaningful. For example, the Creativity page has six parenthetical links before you reach the first link. However, most links inside parentheses are for translations and other self-referring content, as you can see on the Ancient Greece page; following the link to Greek there would be wrong, as it refers to the translation, not the content of the page itself. Therefore, parenthetical links were excluded using the isValid(ref, paragraph) function from Christopher Chiche on Stack Overflow.
I then built a list of links and grabbed the first one. Typically, these hrefs were structured as /wiki/ followed by the page name. To get cleaner node names, I removed the /wiki/ prefix, as it would be repetitive to see it on every single page. If there were no functional links on a page, it served as a dead end for the network even though it may not fall under Wikipedia’s definition of a Dead-End page. In practice, this applied only to Disambiguation Pages. If an AttributeError or TypeError occurs, which is rare, a unique string “!FAIL!:” is added to the front of the url so that it can be detected later and not confused with successful pages.
Finally, to find the second link on a page, I used a nearly identical function that returned the second item in the list or, if there was only one link, no links at all.
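The filtering rules above can be illustrated with a short sketch. This is not the original get_first_link: it works on an HTML string rather than a url, and it swaps Beautiful Soup for the standard library's html.parser so that it runs without extra dependencies. The class and function names are hypothetical, and treating any href containing a colon as a non-article link is a simplifying assumption.

```python
from html.parser import HTMLParser


class LeadLinkParser(HTMLParser):
    """Collect internal article links inside <p> tags, skipping parentheses."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False  # True while inside a <p> element
        self.paren_depth = 0       # depth of "(" ... ")" in the visible text
        self.links = []            # cleaned link targets, in order of appearance

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
        elif tag == "a" and self.in_paragraph and self.paren_depth == 0:
            href = dict(attrs).get("href") or ""
            # Keep /wiki/ links only. Citations start with "#", external
            # links with "http", and File:/Help: pages contain a colon
            # (assumption: article titles with colons are ignored here).
            if href.startswith("/wiki/") and ":" not in href:
                self.links.append(href[len("/wiki/"):])

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False
            self.paren_depth = 0  # an unbalanced "(" should not leak onward

    def handle_data(self, data):
        if not self.in_paragraph:
            return
        for ch in data:
            if ch == "(":
                self.paren_depth += 1
            elif ch == ")" and self.paren_depth:
                self.paren_depth -= 1


def lead_links(html):
    """Return all valid paragraph links in order of appearance."""
    parser = LeadLinkParser()
    parser.feed(html)
    return parser.links


def first_link_from_html(html):
    links = lead_links(html)
    return links[0] if links else None


def second_link_from_html(html):
    links = lead_links(html)
    return links[1] if len(links) > 1 else None
```

Links inside list items are ignored because only paragraph content is inspected, which mirrors the choice to exclude bullet points.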
Creating the First-Link Network
network_expander(G, page_url, seen_pages, is_root, fails, disconnects, convergence_df, new_pages=100)
This function is used to create or expand the network. This is done using a Breadth-First Search (BFS). It takes in a lot of variables, but many of those are just set as empty lists on the first run. They are primarily there to give the option of expanding the network in multiple steps rather than one giant run-through, as it takes quite some time to run.
The function first checks whether there have been any previous iterations or whether it is starting anew. It also has a list of “Notable Nodes” that I have manually set. These are the nodes that I have found in my analysis to be the most important (central). I track their centrality to ensure the network has converged, so that we can make claims about the centrality of these nodes despite not encompassing all of the pages of Wikipedia. Additionally, the function monitors the average page distance from the philosophy page and the size of the network’s weakly connected Giant Connected Component (GCC), as I will explain shortly. Finally, the function finds the name of the first page it will look at by splitting its url.
Then, the bulk of the function occurs in a for loop. Each pass through the loop adds a new “seed page” to the network; that is, it starts at a new page and works its way towards the philosophy page or to another page that loops back to itself. The new_pages parameter determines how many times this loop runs. For my final network, I set this to 50,000. However, this does not mean there are 50,000 pages in the network. Rather, there are 50,000 pages plus all of the pages in between those seed pages and the philosophy page, resulting in INSERT FINAL NUMBER HERE pages. With greater time and computing power, I would like to conduct a larger analysis. However, none of the values I discuss would change in any significant way, as demonstrated in the convergence section of my analysis.
Each loop starts with the url of its seed page (page_url). For all but the first page, these pages are found using wiki_random_page(seen_pages). This function uses a while loop that ends when the function finds a new random page. It uses the Wikipedia API’s random function to select a random Wikipedia page, then checks that the page is new, meaning it is not in the input parameter seen_pages, which contains a list of every page that the function has previously seen. Then, it avoids two types of pages:
List Pages: These pages often do not contain any actual information and are just lists of other Wikipedia pages. While some would work for the network, many are unnecessary and lack any links in their primary content, creating issues for the network. See List of painters by name beginning with “P” as an example. These are not pages that would impact Wikipedia’s navigability and therefore we can exclude them as seed pages. They are not skipped if they are the first link on a page which can occur (e.g. Sitting).
Disambiguation Pages: These pages were a much easier decision to skip as they contain no information. They serve to point users to actual pages when their search term was too vague. Additionally, they all lack a first link and would skew statistics such as the size of the GCC. See Category: Disambiguation Pages for more information and the Art Disambiguation Page as an example.
Finally, wiki_random_page creates a proper page url by replacing spaces with underscores and breaks the while loop. The function then returns the random_page name and its url, page_url.
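The seed-selection logic can be sketched as follows. This is an illustration, not the original wiki_random_page: the source of random titles is passed in as a callable (random_title, a hypothetical stand-in for the Wikipedia API's random function) so the sketch runs offline, and detecting list and disambiguation pages from their titles alone is an assumption, since the text does not say how those pages were identified.

```python
def is_valid_seed(title, seen_pages):
    """Reject pages already seen, list pages, and disambiguation pages."""
    lowered = title.lower()
    if title in seen_pages:
        return False
    # Assumption: list pages are recognised by their title prefix.
    if lowered.startswith(("list of", "lists of")):
        return False
    # Assumption: disambiguation pages carry the suffix in their title.
    if "(disambiguation)" in lowered:
        return False
    return True


def title_to_url(title):
    """Build a page url by replacing spaces with underscores."""
    return "https://en.wikipedia.org/wiki/" + title.replace(" ", "_")


def pick_seed(random_title, seen_pages, max_tries=1000):
    """Draw titles until a valid new seed is found; return (title, url)."""
    for _ in range(max_tries):
        title = random_title()
        if is_valid_seed(title, seen_pages):
            return title, title_to_url(title)
    raise RuntimeError("no valid seed page found")
```

The max_tries cap replaces an unbounded while loop so that the sketch cannot spin forever if the title source is exhausted.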
It then gets the first link on that page using get_first_link(page_url=page_url). It then double checks that get_first_link returned a string. If it did not, and returned a NoneType instead, there are two options: 1. If that page is a seed page, it picks a new seed page and starts over. 2. If it is the first link of a different seed page, its url is added to the fails list which is manually checked at the end to repair any issues. These are, however, very rare (about 1 in 10,000).
Next, it normalizes the formatting of the first link by making it all lowercase. This is stored as a separate variable because capitalization is [case sensitive in Wikipedia URLs](https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking#:~:text=(Wikipedia%20article%20titles%20almost%20always,characters%20after%20the%20initial%20one.). For the network, however, the capital p Philosophy page should be no different than the lowercase p philosophy page. Therefore, all nodes are lowercase.
Then, it checks whether the returned string contains the unique fail string “!FAIL!:” discussed above. If it does, the function follows the same procedure as if a NoneType had been returned, but instead adds the first link to the fails list.
If there is a dead end node, that is added to a list of disconnects, then a new seed page is found using wiki_random_page(seen_pages).
If it makes it through all of those checks, which most pages do, it is added as a node. Additionally, if it is not the seed page, an edge is added from the previous page to its first link.
Then, once all of the Notable Nodes are in the network (which should happen after just a few iterations), at every 1/100th of the total network size, the Betweenness Centrality, Closeness Centrality, and In-Degree Centrality of each notable node are calculated using NetworkX centrality functions. The average distance from the philosophy page and the size of the GCC are calculated manually at the same time. These are then organized into a row of a Pandas DataFrame called convergence_df. Out-Degree centrality was excluded because the out-degree of every page is 1, making all of their Out-Degree centralities identical. Additionally, eigenvector centrality was ignored, as the idea that high-degree nodes would be connected to one another is doubtful in this network. There is nothing to suggest that a page being a common first link makes its own first link common as well. Given that the underlying assumption behind eigenvector centrality is not met by this network, it was not worth tracking. These are the values we will track to ensure the network is large enough that they are no longer changing. This is by far the most time-consuming part of the function. Once the network reaches a large enough size, these calculations can take several minutes, which is why they are only done 1% of the time.
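To make the tracked quantities concrete, here is a minimal sketch of one convergence row computed on a toy network. It is not the original code: the network is represented as a plain dict mapping each page to its first link (an assumption made to avoid dependencies, whereas the study uses a NetworkX graph), the function names are hypothetical, and the share of pages reaching the target is used as a simple stand-in for the size of the GCC. In-degree centrality is normalized by n − 1, which is the convention NetworkX uses.

```python
from collections import Counter


def distances_to_target(first_link, target="philosophy"):
    """Return {page: clicks to reach target} for every page that reaches it."""
    dist = {target: 0}
    unreachable = set()
    for start in first_link:
        path, on_path, page = [], set(), start
        # Walk forward until we hit a page with a known outcome or a loop.
        while (page is not None and page not in dist
               and page not in unreachable and page not in on_path):
            path.append(page)
            on_path.add(page)
            page = first_link.get(page)
        if page in dist:
            # Back-fill distances along the path we just walked.
            for steps, visited in enumerate(reversed(path), start=1):
                dist[visited] = dist[page] + steps
        else:
            unreachable.update(path)
    return dist


def convergence_row(first_link, notable, target="philosophy"):
    """Compute the tracked measures; assumes the network has at least two nodes."""
    nodes = set(first_link) | {p for p in first_link.values() if p is not None}
    nodes.add(target)
    dist = distances_to_target(first_link, target)
    others = [d for page, d in dist.items() if page != target]
    in_degree = Counter(first_link.values())
    return {
        "avg_distance": sum(others) / len(others) if others else 0.0,
        "share_reaching_target": len(dist) / len(nodes),
        "in_degree_centrality": {
            page: in_degree[page] / (len(nodes) - 1) for page in notable
        },
    }


# Toy data for illustration only; these edges are not measured results.
toy_links = {
    "biology": "science", "science": "knowledge", "knowledge": "awareness",
    "awareness": "philosophy", "money": "payment", "payment": "money",
}
row = convergence_row(toy_links, notable=["science", "awareness"])
```

Appending one such row at each checkpoint yields a table like convergence_df, whose columns can then be inspected for flattening.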
Finally, the first link is checked to determine whether it is the philosophy page, in which case we can move on to a new seed page as we know its outcome. Then, it is checked to see whether it returned to the seed page, meaning the path looped back to itself; such pages are added to the list of disconnects. Most disconnects, however, occur at more central pages and have to be found later by looking at the Smaller Connected Components. Next, the link is checked to see whether it has already been visited, in which case we know its eventual outcome and can select a new seed page. Finally, if none of these conditions are met, it is added to the seen pages and searched for its own first link. This process continues until one of the previous criteria is met.
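The stopping rules for a single seed page can be sketched as below. This is a simplified illustration rather than the body of network_expander: the first_link argument is a hypothetical callable that returns the (already lowercased) first link of a page or None, and the toy mapping in the example is invented apart from the money and payment pair discussed later in the text.

```python
def follow_first_links(seed, first_link, seen_pages, target="philosophy"):
    """Walk first links from seed; return (path, outcome)."""
    path = [seed]
    seen_pages.add(seed)
    current = seed
    while True:
        nxt = first_link(current)
        if nxt is None:
            return path, "dead_end"          # no usable link on the page
        path.append(nxt)
        if nxt == target:
            return path, "reached_target"    # outcome known, pick a new seed
        if nxt == seed:
            return path, "loop_to_seed"      # recorded as a disconnect
        if nxt in seen_pages:
            return path, "already_visited"   # its outcome is already known
        seen_pages.add(nxt)
        current = nxt


# Toy first-link table, for illustration only.
toy_links = {
    "pianist": "piano", "piano": "music", "music": "awareness",
    "awareness": "philosophy", "guitar": "music",
    "money": "payment", "payment": "money",
}
seen = set()
for seed in ("pianist", "guitar", "money"):
    path, outcome = follow_first_links(seed, toy_links.get, seen)
    print(seed, outcome, path)
```

Because every visited page is added to seen_pages, a walk that enters a cycle not containing the seed still terminates on the second visit to any page of that cycle.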
After the loop is completed, it returns the Network (G), its seen_pages, fails, disconnects, and the Convergence DataFrame (convergence_df) to be analyzed.
Incidentally, the function can process roughly 1000 seed pages every 10 minutes. However, this slows down as the network expands due to the convergence calculations, which is why the network size is limited.
I then manually check and fix any failed pages to complete the network. The network is saved to a GML file and the convergence data to a CSV file.
Creating the Second-Link Network
second_link_network_expander(G, page_url, seen_pages, is_root, fails, disconnects, convergence_df, new_pages=100)
The second-link network was created in a nearly identical fashion with a couple of key differences. First, it uses get_second_link(page_url) to gather the pages’ second links rather than their first. Additionally, since it is more difficult to know the “notable” pages in advance, these are selected by finding the top 3 pages in each centrality measure. Since we want to wait until the network reaches a critical size before doing so, this does not start until the network has at least 3000 seed pages. There is also no reason to check each page’s distance from the philosophy page in this network, so that convergence measure has been eliminated. Similarly, it no longer stops looking for new pages at a specific page like it did for the philosophy page; now, it continues until it hits a page that we have already visited.
I chose not to investigate any other link locations for a few reasons. First, as the link position increases, it becomes increasingly unlikely that a page has that many links on it, shrinking the network. Second, no patterns presented themselves within the second-link network that seemed necessary to investigate at other locations. Finally, as previously discussed, the opening line(s) of a Wikipedia article have significantly more guidance on their structure and links (Wikipedia 2023b). Link locations beyond these are unlikely to be more than a random assortment of pages with no notable patterns; however, work would have to be done to prove this claim.
Plotting Methods
To create the plots needed for my analysis, I used Matplotlib, Seaborn, and NetworkX’s drawing tools.
Results
First Link Network
Convergence
First, it is important to demonstrate that the size of the network is sufficient to make claims as to the nature of this phenomenon. Beginning with the centrality measures of the most important nodes in the network, we can see that they have all leveled off and that any additional nodes would not change their values in any statistically significant way.
In the left column, you can see the centrality measure across the entire network construction while the right column features the last iterations of the network construction to give a “zoomed in” view of the measures. All of them are almost entirely flat, with slopes that are less than 0.001.
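One way to put a number on this flatness is to fit a least-squares line to the last few rows of a tracked measure and compare its slope to a tolerance. The sketch below assumes this is how a slope would be obtained; the text does not state the method used, and the function names, the tail length of 20, and the use of the checkpoint index as the x-axis are all assumptions.

```python
def tail_slope(values, tail=20):
    """Least-squares slope of the last `tail` values against their index."""
    ys = list(values)[-tail:]
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def has_converged(values, tail=20, tolerance=1e-3):
    """True when the tail of the series is flat to within the tolerance."""
    return abs(tail_slope(values, tail)) < tolerance
```

Applied to a column of the convergence data, has_converged(column) would return True when the measure changes by less than 0.001 per checkpoint over the final 20 checkpoints.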
Next, we can see that the average distance from the philosophy page also flattened out with the network’s expansion.
While it is not quite as flat as the centrality measures, we can still see strong evidence that it has settled to an approximate value of INSERT HERE.
Finally, we see an interesting negative slope in the size of the GCC of the network as new nodes are added.
It will take further research to understand why this negative slope exists; however, we see it settle around 87% of the total network. This is a sharp drop from the previous value of 97% (Daniel Lamprecht 2016). There are a couple of reasons for this. First, I would expect that they included lists in their analysis. More importantly, the 97% figure cited by Wikipedia is not actually the share of pages that reach the Philosophy page in the first-link network. Rather, it is “the percentage of articles which eventually lead to a cycle when repeatedly following first links.” (Daniel Lamprecht 2016) There are several of these cycles which occur without ever connecting to the Philosophy page. For example, the first link on the money page is payment. Then, the first link on the payment page is money. This loop blocks several nodes from ever reaching the philosophy page. Similar loops occur on the name, accounting, and candidate pages, to name a few. This difference in methodology likely accounts for the nearly 10% difference here. Lamprecht et al. do list a figure of 92.1% for the number of pages that link directly to the philosophy page. This much smaller difference is likely the result of their inclusion of list links, the sizes of our networks, their use of the Wikipedia API, and changes in Wikipedia’s network structure over the past seven years. With so many interacting variables, it is difficult to make a strong claim as to the true difference in the size of this component.
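The distinction between "ends in some cycle" and "ends in the cycle containing philosophy" can be made concrete with a small sketch. It assumes the network is available as a dict mapping each page to its first link; the function names are hypothetical, and the toy data are invented for illustration apart from the money and payment loop described above ("loop_page" is a placeholder for a page on philosophy's own cycle).

```python
def terminal_cycle(start, first_link):
    """Return the cycle that `start` ends in as a frozenset, or None for a dead end."""
    position = {}  # page -> index at which it was first visited
    path = []
    page = start
    while page is not None and page not in position:
        position[page] = len(path)
        path.append(page)
        page = first_link.get(page)
    if page is None:
        return None  # the walk ran out of links before repeating a page
    return frozenset(path[position[page]:])


def cycle_shares(first_link, target="philosophy"):
    """Return (share ending in any cycle, share ending in the target's cycle)."""
    pages = list(first_link)
    cycles = [terminal_cycle(page, first_link) for page in pages]
    in_any = sum(c is not None for c in cycles)
    in_target = sum(c is not None and target in c for c in cycles)
    return in_any / len(pages), in_target / len(pages)


# Toy data for illustration only.
toy_links = {
    "philosophy": "loop_page", "loop_page": "philosophy",
    "science": "philosophy",
    "money": "payment", "payment": "money", "coin": "money",
    "orphan": "stub",  # "stub" has no first link, so this is a dead end
}
any_cycle_share, philosophy_share = cycle_shares(toy_links)
```

In this toy network, six of seven pages end in some cycle but only three end in the cycle containing philosophy, which is the same kind of gap that separates the two published figures.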
Notable Nodes and Paths to Philosophy
As was discussed earlier, there were several nodes that became apparent as the most important in the network. These nodes ‘funneled’ pages into the philosophy page and thus boasted the largest centrality measures in the network. This can perhaps best be seen by visualizing the nodes closest to the philosophy node using the force-directed Kamada-Kawai Layout.
These plots help visualize how nodes flow towards the philosophy page. You can see that it has several low-degree neighbors in addition to some of these huge hubs. I found 31 neighbors of the philosophy page, but there are surely countless others that were not found within my search. For example, the first link on Immanuel Kant’s page is philosophy, but that page is unlikely to be the first link on more than a handful of other pages, making its chances of being found in a network of this size incredibly small. The same likely goes for numerous philosophers and adjacent topics. The neighbors are listed below by degree:
| | Node | Degree |
|---|---|---|
| 1 | political_philosophy | 15 |
| 2 | specialty_(medicine) | 7 |
| 3 | modernism | 6 |
| 4 | aesthetics | 6 |
| 5 | medical_specialty | 5 |
| 6 | philosophy_of_culture | 3 |
| 7 | ethics | 3 |
| 8 | awareness | 3 |
| 9 | aesthetic | 3 |
| 10 | platonism | 2 |
| 11 | object_(philosophy) | 2 |
| 12 | post-structuralist | 2 |
| 13 | philosophies | 2 |
| 14 | medical_speciality | 2 |
| 15 | american_enlightenment | 2 |
| 16 | political_theory | 2 |
| 17 | natural_philosophy | 2 |
| 18 | moral_philosophy | 2 |
| 19 | naturalism_(philosophy) | 2 |
| 20 | philosophy_of_logic | 2 |
| 21 | philosophy_of_science | 2 |
| 22 | outline_of_philosophy | 2 |
| 23 | subjectivity | 2 |
| 24 | philosophical_tradition | 2 |
| 25 | metaphysics | 2 |
| 26 | humanism | 1 |
| 27 | educational_philosophy | 1 |
| 28 | deist | 1 |
| 29 | neuroethics | 1 |
| 30 | rationalism | 1 |
| 31 | age_of_enlightenment | 1 |
However, this order does not represent the average path to philosophy. More typically, pages will end up on one of a few specific paths to philosophy. Pages for most disciplines in the arts, sciences, or technology will most often end up on the science page, which leads to the knowledge page, then awareness, and finally philosophy. Awareness is by far the largest neighbor of Philosophy. Its closeness centrality just barely trails philosophy’s as the second largest in the network. Those two are followed by knowledge and science, respectively. Below are the full values:
| | Node | Closeness Centrality |
|---|---|---|
| 1 | philosophy | 0.0727122 |
| 2 | awareness | 0.0700715 |
| 3 | knowledge | 0.0590686 |
| 4 | science | 0.0447865 |
| 5 | geography | 0.0254912 |
| 6 | continent | 0.0226938 |
| 7 | mind | 0.0218818 |
| 8 | state_(polity) | 0.0211012 |
| 9 | psychology | 0.0206636 |
| 10 | thought | 0.0196981 |
The Awareness node is so central, in fact, that when you remove the Philosophy node from the network, severing Awareness from all of the other paths to philosophy, the GCC is still INSERT PERCENTAGE HERE of the network, centered on the awareness page. The next largest component of this network is INSERT PAGE HERE.
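The removal experiment can be sketched with a small union-find over the first-link edges, ignoring edge direction to obtain weakly connected components. This is an illustration under stated assumptions, not the code used for the study: the network is a plain dict from page to first link, the function name is hypothetical, and the toy edges are invented, so the component sizes it produces say nothing about the real network.

```python
def weak_components(first_link, removed=()):
    """Weakly connected components, largest first, after dropping `removed` nodes."""
    removed = set(removed)
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for src, dst in first_link.items():
        if src in removed:
            continue
        find(src)  # register the node even if its edge is dropped
        if dst is None or dst in removed:
            continue
        parent[find(src)] = find(dst)  # merge the two components

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=len, reverse=True)


# Toy data for illustration only.
toy_links = {
    "biology": "science", "science": "knowledge", "knowledge": "awareness",
    "consciousness": "awareness", "awareness": "philosophy",
    "ethics": "philosophy", "metaethics": "ethics",
}
components = weak_components(toy_links, removed={"philosophy"})
largest_share = len(components[0]) / sum(len(c) for c in components)
```

Passing removed={"philosophy", "awareness"} would run the second experiment from the hypothesis, in which the awareness node is removed as well.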
Additionally, many technical words, particularly those of foreign origin, will go to the page for their language of origin. These are then directed to the language page, which eventually reaches the Philosophy of Logic page, which then hits the philosophy page. Unfortunately, these nodes are too far from the Philosophy page to be visualized.
There are also several pages with high in-degree centralities that are not seen in these plots. Below are the largest nodes by Degree and In-Degree centrality and their paths to philosophy:
| | Node | Degree | In-Degree Centrality | Path to Philosophy |
|---|---|---|---|---|
| 1 | county_(united_states) | 200 | 0.00840372 | [‘county_(united_states)’, ‘united_states’, ‘north_america’, ‘continent’, ‘geography’, ‘science’, ‘knowledge’, ‘awareness’, ‘philosophy’] |
| 2 | association_football | 158 | 0.00663007 | [‘association_football’, ‘team_sport’, ‘sport’, ‘physical_activity’, ‘exercise’, ‘human_body’, ‘human’, ‘species’, ‘biology’, ‘science’, ‘knowledge’, ‘awareness’, ‘philosophy’] |
| 3 | public_university | 128 | 0.00536318 | [‘public_university’, ‘university’, ‘educational_institution’, ‘education’, ‘knowledge’, ‘awareness’, ‘philosophy’] |
| 4 | u.s._state | 109 | 0.00456081 | [‘u.s._state’, ‘united_states’, ‘north_america’, ‘continent’, ‘geography’, ‘science’, ‘knowledge’, ‘awareness’, ‘philosophy’] |
| 5 | family_(biology) | 94 | 0.00392736 | [‘family_(biology)’, ‘taxonomic_rank’, ‘biology’, ‘science’, ‘knowledge’, ‘awareness’, ‘philosophy’] |
| 6 | county_seat | 78 | 0.00325169 | [‘county_seat’, ‘seat_of_government’, ‘government’, ‘state_(polity)’, ‘politics’, ‘decision-making’, ‘psychology’, ‘mind’, ‘thought’, ‘consciousness’, ‘awareness’, ‘philosophy’] |
| 7 | tennis | 74 | 0.00308277 | [‘tennis’, ‘list_of_racket_sports’, ‘game’, ‘play_(activity)’, ‘recreational’, ‘leisure’, ‘time’, ‘sequence’, ‘mathematics’, ‘knowledge’, ‘awareness’, ‘philosophy’] |
| 8 | rock_music | 69 | 0.00287162 | [‘rock_music’, ‘genre_(music)’, ‘music’, ‘the_arts’, ‘creativity’, ‘psychology’, ‘mind’, ‘thought’, ‘consciousness’, ‘awareness’, ‘philosophy’] |
| 9 | capital_city | 69 | 0.00287162 | [‘capital_city’, ‘municipality’, ‘administrative_division’, ‘sovereign_state’, ‘state_(polity)’, ‘politics’, ‘decision-making’, ‘psychology’, ‘mind’, ‘thought’, ‘consciousness’, ‘awareness’, ‘philosophy’] |
| 10 | administrative_division | 65 | 0.0027027 | [‘administrative_division’, ‘sovereign_state’, ‘state_(polity)’, ‘politics’, ‘decision-making’, ‘psychology’, ‘mind’, ‘thought’, ‘consciousness’, ‘awareness’, ‘philosophy’] |
As you can see, many relate to geography. The outlier here seems to be the Association Football page, which appears due to a shockingly large number of football (soccer) pages. Whether this is due to random chance in the search or because football pages make up a large portion of Wikipedia’s network is impossible to say given our sample size; however, association_football showed up consistently as one of the largest nodes by in-degree during the data collection process.
Finally, the furthest page from philosophy that still reached it was INSERT HERE, and its path can be seen below.
Second Link Network
Again, you can see that the network’s largest nodes all converge, showing little change in their centrality measures.
Conclusions